The problem of adversarial defenses for image classification, where the goal is to robustify a classifier against adversarial examples, is considered. Inspired by the hypothesis that these examples lie beyond the natural image manifold, a novel aDversarIal defenSe with local impliCit functiOns (DISCO) is proposed to remove adversarial perturbations by localized manifold projections. DISCO consumes an adversarial image and a query pixel location and outputs a clean RGB value at the location. It is implemented with an encoder and a local implicit module, where the former produces per-pixel deep features and the latter uses the features in the neighborhood of query pixel for predicting the clean RGB value. Extensive experiments demonstrate that both DISCO and its cascade version outperform prior defenses, regardless of whether the defense is known to the attacker. DISCO is also shown to be data and parameter efficient and to mount defenses that transfers across datasets, classifiers and attacks.
translated by 谷歌翻译
对象检测器的复杂性过度权衡是资源约束视觉任务的关键问题。先前的作品强调了用有效的骨干实现的检测器。在这项工作中,研究了对检测负责人对提案处理的这种权衡的影响。假设提高的检测效率需要范式转移,朝着不平等的建议处理,将更多的计算分配给良好的建议,而不是贫穷的建议。这可以更好地利用可用的计算预算,从而为同一失败提供了更高的精度。我们将其作为一个学习问题提出,目的是将操作员分配给检测头的建议,以便将总计算成本受到限制,并且精确度最大。关键发现是,可以将这种匹配作为一个函数,该函数将每个提案嵌入到操作员的单速代码中。尽管此功能诱导了复杂的动态网络路由机制,但它可以由简单的MLP实现,并通过现成的对象检测器端到端学习。这种“动态建议处理”(DPP)显示出明确的计算复杂性的明确余量,表现出优于最先进的端到端对象检测器(DETR,稀疏R-CNN)。
translated by 谷歌翻译
需要了解和预测车辆的行为是运输领域中的公共和私人目标的基础,包括城市规划和管理,乘车共享服务以及智能运输系统。个人的喜好和预期目的地在整天,周和年中各不相同:例如,酒吧在晚上最受欢迎,海滩在夏天最受欢迎。尽管有这一原则,我们注意到葡萄牙波尔图的流行基准数据集的最新研究充其量只能通过纳入时间信息来提高预测性能的边际改善。我们提出了一种基于HyperNetworks的方法,该方法是一种元学习的变体(“学习学习”),其中神经网络学会根据输入来改变自己的权重。在我们的情况下,负责目标预测的权重有所不同,尤其是输入轨迹的时间。时间条件的权重显着改善了模型相对于消融研究和可比较的工作的误差,我们证实了我们的假设,即时间知识应改善对车辆预期目的地的预测。
translated by 谷歌翻译
在从少数类(基类)开始的情况下,已经广泛研究了课堂学习学习(CIL)。取而代之的是,我们探索了一个研究不足的CIL现实环境,该设置是从在大量基类中进行预训练的强大模型开始。我们假设强大的基本模型可以为新颖的类别提供良好的表示,并且可以通过小型适应来进行渐进的学习。我们提出了一个2阶段的训练方案,i)功能增强 - 将部分的克隆部分克隆并在新型数据上进行微调,ii)融合 - 将基础和新型分类器组合到统一的分类器中。实验表明,所提出的方法在大型成像网数据集上的最先进的CIL方法明显优于最先进的CIL方法(例如,总体准确度 +最佳 +最佳精度为10%)。我们还建议和分析研究研究的实际CIL方案,例如与分布转移的基础新颖性重叠。我们提出的方法是鲁棒的,并概括了所有分析的CIL设置。代码可从https://github.com/amazon-research/sp-cil获得。
translated by 谷歌翻译
In object detection, the intersection over union (IoU) threshold is frequently used to define positives/negatives. The threshold used to train a detector defines its quality. While the commonly used threshold of 0.5 leads to noisy (low-quality) detections, detection performance frequently degrades for larger thresholds. This paradox of high-quality detection has two causes: 1) overfitting, due to vanishing positive samples for large thresholds, and 2) inference-time quality mismatch between detector and test hypotheses. A multi-stage object detection architecture, the Cascade R-CNN, composed of a sequence of detectors trained with increasing IoU thresholds, is proposed to address these problems. The detectors are trained sequentially, using the output of a detector as training set for the next. This resampling progressively improves hypotheses quality, guaranteeing a positive training set of equivalent size for all detectors and minimizing overfitting. The same cascade is applied at inference, to eliminate quality mismatches between hypotheses and detectors. An implementation of the Cascade R-CNN without bells or whistles achieves state-of-the-art performance on the COCO dataset, and significantly improves high-quality detection on generic and specific object detection datasets, including VOC, KITTI, CityPerson, and WiderFace. Finally, the Cascade R-CNN is generalized to instance segmentation, with nontrivial improvements over the Mask R-CNN. To facilitate future research, two implementations are made available at https://github.com/zhaoweicai/cascade-rcnn (Caffe) and https://github.com/zhaoweicai/Detectron-Cascade-RCNN (Detectron).
translated by 谷歌翻译
Domain adaptation for semantic image segmentation is very necessary since manually labeling large datasets with pixel-level labels is expensive and time consuming. Existing domain adaptation techniques either work on limited datasets, or yield not so good performance compared with supervised learning. In this paper, we propose a novel bidirectional learning framework for domain adaptation of segmentation. Using the bidirectional learning, the image translation model and the segmentation adaptation model can be learned alternatively and promote to each other. Furthermore, we propose a self-supervised learning algorithm to learn a better segmentation adaptation model and in return improve the image translation model. Experiments show that our method is superior to the state-of-the-art methods in domain adaptation of segmentation with a big margin. The source code is available at https://github.com/liyunsheng13/BDL.
translated by 谷歌翻译
In object detection, an intersection over union (IoU) threshold is required to define positives and negatives. An object detector, trained with low IoU threshold, e.g. 0.5, usually produces noisy detections. However, detection performance tends to degrade with increasing the IoU thresholds. Two main factors are responsible for this: 1) overfitting during training, due to exponentially vanishing positive samples, and 2) inference-time mismatch between the IoUs for which the detector is optimal and those of the input hypotheses. A multi-stage object detection architecture, the Cascade R-CNN, is proposed to address these problems. It consists of a sequence of detectors trained with increasing IoU thresholds, to be sequentially more selective against close false positives. The detectors are trained stage by stage, leveraging the observation that the output of a detector is a good distribution for training the next higher quality detector. The resampling of progressively improved hypotheses guarantees that all detectors have a positive set of examples of equivalent size, reducing the overfitting problem. The same cascade procedure is applied at inference, enabling a closer match between the hypotheses and the detector quality of each stage. A simple implementation of the Cascade R-CNN is shown to surpass all single-model object detectors on the challenging COCO dataset. Experiments also show that the Cascade R-CNN is widely applicable across detector architectures, achieving consistent gains independently of the baseline detector strength. The code will be made available at https://github.com/zhaoweicai/cascade-rcnn.
translated by 谷歌翻译
Real-world robotic grasping can be done robustly if a complete 3D Point Cloud Data (PCD) of an object is available. However, in practice, PCDs are often incomplete when objects are viewed from few and sparse viewpoints before the grasping action, leading to the generation of wrong or inaccurate grasp poses. We propose a novel grasping strategy, named 3DSGrasp, that predicts the missing geometry from the partial PCD to produce reliable grasp poses. Our proposed PCD completion network is a Transformer-based encoder-decoder network with an Offset-Attention layer. Our network is inherently invariant to the object pose and point's permutation, which generates PCDs that are geometrically consistent and completed properly. Experiments on a wide range of partial PCD show that 3DSGrasp outperforms the best state-of-the-art method on PCD completion tasks and largely improves the grasping success rate in real-world scenarios. The code and dataset will be made available upon acceptance.
translated by 谷歌翻译
Stress has a great effect on people's lives that can not be understated. While it can be good, since it helps humans to adapt to new and different situations, it can also be harmful when not dealt with properly, leading to chronic stress. The objective of this paper is developing a stress monitoring solution, that can be used in real life, while being able to tackle this challenge in a positive way. The SMILE data set was provided to team Anxolotl, and all it was needed was to develop a robust model. We developed a supervised learning model for classification in Python, presenting the final result of 64.1% in accuracy and a f1-score of 54.96%. The resulting solution stood the robustness test, presenting low variation between runs, which was a major point for it's possible integration in the Anxolotl app in the future.
translated by 谷歌翻译
This article presents a dataset of 10,917 news articles with hierarchical news categories collected between January 1st 2019, and December 31st 2019. We manually labelled the articles based on a hierarchical taxonomy with 17 first-level and 109 second-level categories. This dataset can be used to train machine learning models for automatically classifying news articles by topic. This dataset can be helpful for researchers working on news structuring, classification, and predicting future events based on released news.
translated by 谷歌翻译